Overview

Dataset statistics

Number of variables: 11
Number of observations: 699
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 1.1%
Total size in memory: 60.2 KiB
Average record size in memory: 88.2 B
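
The two derived figures in this block can be cross-checked from the raw ones. The snippet below is a minimal sketch, assuming the duplicate percentage is taken over all observations and that 1 KiB means 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics in the report.
n_rows = 699
n_duplicate_rows = 8
total_size_kib = 60.2

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (assuming 1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report (1.1% and 88.2 B).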

Variable types

Numeric: 9
Categorical: 2

Warnings

Dataset has 8 (1.1%) duplicate rows (Duplicates)
size is highly correlated with shape (High correlation)
shape is highly correlated with size (High correlation)

Reproduction

Analysis started: 2021-02-25 09:31:09.485360
Analysis finished: 2021-02-25 09:31:20.717879
Duration: 11.23 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

ID
Real number (ℝ≥0)

Distinct: 645
Distinct (%): 92.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1071704.099
Minimum: 61634
Maximum: 13454352
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 61634
5-th percentile: 411453
Q1: 870688.5
Median: 1171710
Q3: 1238298
95-th percentile: 1333890.8
Maximum: 13454352
Range: 13392718
Interquartile range (IQR): 367609.5

Descriptive statistics

Standard deviation: 617095.7298
Coefficient of variation (CV): 0.5758079404
Kurtosis: 257.7171591
Mean: 1071704.099
Median Absolute Deviation (MAD): 104381
Skewness: 13.67532594
Sum: 749121165
Variance: 3.808071398 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
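
Several of the ID statistics above are simple functions of the others, which makes them easy to sanity-check. The following is a minimal sketch using only the figures printed in the report; the variable names are illustrative, and the relationships (mean = sum / n, CV = standard deviation / mean, and so on) are the standard definitions rather than anything confirmed from the tool's source.

```python
import math

# Values copied from the ID variable's statistics in the report.
n = 699
total = 749121165
minimum, maximum = 61634, 13454352
q1, q3 = 870688.5, 1238298
std = 617095.7298

mean = total / n                 # Mean = Sum / number of observations
value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1
cv = std / mean                  # CV = Standard deviation / Mean
variance = std ** 2              # Variance = Standard deviation squared

print(mean, value_range, iqr, cv, variance)
```

Small differences in the last printed digit of the variance are expected, because the standard deviation in the report is itself rounded.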
Common values

Value               Count   Frequency (%)
1182404             6       0.9%
1276091             5       0.7%
1198641             3       0.4%
1114570             2       0.3%
704097              2       0.3%
695091              2       0.3%
1320077             2       0.3%
733639              2       0.3%
798429              2       0.3%
1299924             2       0.3%
Other values (635)  671     96.0%

Minimum 5 values

Value               Count   Frequency (%)
61634               1       0.1%
63375               1       0.1%
76389               1       0.1%
95719               1       0.1%
128059              1       0.1%

Maximum 5 values

Value               Count   Frequency (%)
13454352            1       0.1%
8233704             1       0.1%
1371920             1       0.1%
1371026             1       0.1%
1369821             1       0.1%

thickness
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.417739628
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.815740659
Coefficient of variation (CV): 0.6373713473
Kurtosis: -0.6237154123
Mean: 4.417739628
Median Absolute Deviation (MAD): 2
Skewness: 0.5928585327
Sum: 3088
Variance: 7.928395456
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
1       145     20.7%
5       130     18.6%
3       108     15.5%
4       80      11.4%
10      69      9.9%
2       50      7.2%
8       46      6.6%
6       34      4.9%
7       23      3.3%
9       14      2.0%

Minimum 5 values

Value   Count   Frequency (%)
1       145     20.7%
2       50      7.2%
3       108     15.5%
4       80      11.4%
5       130     18.6%

Maximum 5 values

Value   Count   Frequency (%)
10      69      9.9%
9       14      2.0%
8       46      6.6%
7       23      3.3%
6       34      4.9%
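
Because thickness takes only ten distinct values, its summary statistics can be rebuilt from the value counts alone. The sketch below expands the counts listed for thickness into individual observations and recomputes the sum, mean, median and variance with the standard library. It assumes the reported variance is the sample variance (n - 1 denominator), which is what the numbers suggest; the names are illustrative.

```python
import statistics

# Value counts copied from the thickness table (value -> count).
thickness_counts = {1: 145, 2: 50, 3: 108, 4: 80, 5: 130,
                    6: 34, 7: 23, 8: 46, 9: 14, 10: 69}

# Expand the counts back into one observation per row.
observations = [value
                for value, count in sorted(thickness_counts.items())
                for _ in range(count)]

n = len(observations)
total = sum(observations)
mean = total / n
median = statistics.median(observations)
# Sample variance (n - 1 in the denominator); an assumption about the report.
variance = statistics.variance(observations)

print(n, total, mean, median, variance)
```

The recomputed values agree with the report: 699 observations, a sum of 3088, a median of 4, and a mean and variance matching the printed figures to the digits shown.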

size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.134477825
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.05145911
Coefficient of variation (CV): 0.9735143395
Kurtosis: 0.09880288537
Mean: 3.134477825
Median Absolute Deviation (MAD): 0
Skewness: 1.233136558
Sum: 2191
Variance: 9.3114027
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
1       384     54.9%
10      67      9.6%
3       52      7.4%
2       45      6.4%
4       40      5.7%
5       30      4.3%
8       29      4.1%
6       27      3.9%
7       19      2.7%
9       6       0.9%

Minimum 5 values

Value   Count   Frequency (%)
1       384     54.9%
2       45      6.4%
3       52      7.4%
4       40      5.7%
5       30      4.3%

Maximum 5 values

Value   Count   Frequency (%)
10      67      9.6%
9       6       0.9%
8       29      4.1%
7       19      2.7%
6       27      3.9%

shape
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.207439199
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.971912767
Coefficient of variation (CV): 0.9265686995
Kurtosis: 0.007010980047
Mean: 3.207439199
Median Absolute Deviation (MAD): 0
Skewness: 1.161859179
Sum: 2242
Variance: 8.832265496
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
1       353     50.5%
2       59      8.4%
10      58      8.3%
3       56      8.0%
4       44      6.3%
5       34      4.9%
6       30      4.3%
7       30      4.3%
8       28      4.0%
9       7       1.0%

Minimum 5 values

Value   Count   Frequency (%)
1       353     50.5%
2       59      8.4%
3       56      8.0%
4       44      6.3%
5       34      4.9%

Maximum 5 values

Value   Count   Frequency (%)
10      58      8.3%
9       7       1.0%
8       28      4.0%
7       30      4.3%
6       30      4.3%

Marg
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.806866953
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.855379239
Coefficient of variation (CV): 1.017283429
Kurtosis: 0.9879470695
Mean: 2.806866953
Median Absolute Deviation (MAD): 0
Skewness: 1.524468091
Sum: 1962
Variance: 8.1531906
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
1       407     58.2%
2       58      8.3%
3       58      8.3%
10      55      7.9%
4       33      4.7%
8       25      3.6%
5       23      3.3%
6       22      3.1%
7       13      1.9%
9       5       0.7%

Minimum 5 values

Value   Count   Frequency (%)
1       407     58.2%
2       58      8.3%
3       58      8.3%
4       33      4.7%
5       23      3.3%

Maximum 5 values

Value   Count   Frequency (%)
10      55      7.9%
9       5       0.7%
8       25      3.6%
7       13      1.9%
6       22      3.1%

Epith
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.21602289
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.214299887
Coefficient of variation (CV): 0.6885211836
Kurtosis: 2.169066423
Mean: 3.21602289
Median Absolute Deviation (MAD): 0
Skewness: 1.712171802
Sum: 2248
Variance: 4.903123988
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
2       386     55.2%
3       72      10.3%
4       48      6.9%
1       47      6.7%
6       41      5.9%
5       39      5.6%
10      31      4.4%
8       21      3.0%
7       12      1.7%
9       2       0.3%

Minimum 5 values

Value   Count   Frequency (%)
1       47      6.7%
2       386     55.2%
3       72      10.3%
4       48      6.9%
5       39      5.6%

Maximum 5 values

Value   Count   Frequency (%)
10      31      4.4%
9       2       0.3%
8       21      3.0%
7       12      1.7%
6       41      5.9%

bare
Categorical

Distinct: 11
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value              Count
1                  402
10                 132
2                  30
5                  30
3                  28
Other values (6)   77

Length

Max length: 2
Median length: 1
Mean length: 1.188841202
Min length: 1

Characters and Unicode

Total characters: 831
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 10
3rd row: 2
4th row: 4
5th row: 1
Value   Count   Frequency (%)
1       402     57.5%
10      132     18.9%
2       30      4.3%
5       30      4.3%
3       28      4.0%
8       21      3.0%
4       19      2.7%
?       16      2.3%
9       9       1.3%
7       8       1.1%

Histogram of lengths of the category
Value     Count   Frequency (%)
1         402     57.5%
10        132     18.9%
2         30      4.3%
5         30      4.3%
3         28      4.0%
8         21      3.0%
4         19      2.7%
(blank)   16      2.3%
9         9       1.3%
7         8       1.1%

Most occurring characters

Value   Count   Frequency (%)
1       534     64.3%
0       132     15.9%
2       30      3.6%
5       30      3.6%
3       28      3.4%
8       21      2.5%
4       19      2.3%
?       16      1.9%
9       9       1.1%
7       8       1.0%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      815     98.1%
Other Punctuation   16      1.9%

Most frequent character per category

Decimal Number

Value   Count   Frequency (%)
1       534     65.5%
0       132     16.2%
2       30      3.7%
5       30      3.7%
3       28      3.4%
8       21      2.6%
4       19      2.3%
9       9       1.1%
7       8       1.0%
6       4       0.5%

Other Punctuation

Value   Count   Frequency (%)
?       16      100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   831     100.0%

Most frequent character per script

Value   Count   Frequency (%)
1       534     64.3%
0       132     15.9%
2       30      3.6%
5       30      3.6%
3       28      3.4%
8       21      2.5%
4       19      2.3%
?       16      1.9%
9       9       1.1%
7       8       1.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   831     100.0%

Most frequent character per block

Value   Count   Frequency (%)
1       534     64.3%
0       132     15.9%
2       30      3.6%
5       30      3.6%
3       28      3.4%
8       21      2.5%
4       19      2.3%
?       16      1.9%
9       9       1.1%
7       8       1.0%

b1
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.43776824
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.438364252
Coefficient of variation (CV): 0.7092869798
Kurtosis: 0.1846213115
Mean: 3.43776824
Median Absolute Deviation (MAD): 1
Skewness: 1.099969082
Sum: 2403
Variance: 5.945620227
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
2       166     23.7%
3       165     23.6%
1       152     21.7%
7       73      10.4%
4       40      5.7%
5       34      4.9%
8       28      4.0%
10      20      2.9%
9       11      1.6%
6       10      1.4%

Minimum 5 values

Value   Count   Frequency (%)
1       152     21.7%
2       166     23.7%
3       165     23.6%
4       40      5.7%
5       34      4.9%

Maximum 5 values

Value   Count   Frequency (%)
10      20      2.9%
9       11      1.6%
8       28      4.0%
7       73      10.4%
6       10      1.4%

nucleoli
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.86695279
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.053633894
Coefficient of variation (CV): 1.065114816
Kurtosis: 0.4742686755
Mean: 2.86695279
Median Absolute Deviation (MAD): 0
Skewness: 1.422261257
Sum: 2004
Variance: 9.324679956
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value   Count   Frequency (%)
1       443     63.4%
10      61      8.7%
3       44      6.3%
2       36      5.2%
8       24      3.4%
6       22      3.1%
5       19      2.7%
4       18      2.6%
7       16      2.3%
9       16      2.3%

Minimum 5 values

Value   Count   Frequency (%)
1       443     63.4%
2       36      5.2%
3       44      6.3%
4       18      2.6%
5       19      2.7%

Maximum 5 values

Value   Count   Frequency (%)
10      61      8.7%
9       16      2.3%
8       24      3.4%
7       16      2.3%
6       22      3.1%

Mitoses
Real number (ℝ≥0)

Distinct: 9
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.589413448
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.715077943
Coefficient of variation (CV): 1.07906344
Kurtosis: 12.65787807
Mean: 1.589413448
Median Absolute Deviation (MAD): 0
Skewness: 3.560657844
Sum: 1111
Variance: 2.941492349
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Common values

Value   Count   Frequency (%)
1       579     82.8%
2       35      5.0%
3       33      4.7%
10      14      2.0%
4       12      1.7%
7       9       1.3%
8       8       1.1%
5       6       0.9%
6       3       0.4%

Minimum 5 values

Value   Count   Frequency (%)
1       579     82.8%
2       35      5.0%
3       33      4.7%
4       12      1.7%
5       6       0.9%

Maximum 5 values

Value   Count   Frequency (%)
10      14      2.0%
8       8       1.1%
7       9       1.3%
6       3       0.4%
5       6       0.9%

class
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value   Count
2       458
4       241

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 699
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2
Value   Count   Frequency (%)
2       458     65.5%
4       241     34.5%

Histogram of lengths of the category

Value   Count   Frequency (%)
2       458     65.5%
4       241     34.5%

Most occurring characters

Value   Count   Frequency (%)
2       458     65.5%
4       241     34.5%

Most occurring categories

Value            Count   Frequency (%)
Decimal Number   699     100.0%

Most frequent character per category

Value   Count   Frequency (%)
2       458     65.5%
4       241     34.5%

Most occurring scripts

Value    Count   Frequency (%)
Common   699     100.0%

Most frequent character per script

Value   Count   Frequency (%)
2       458     65.5%
4       241     34.5%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   699     100.0%

Most frequent character per block

Value   Count   Frequency (%)
2       458     65.5%
4       241     34.5%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
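
The calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the code the report generator uses; the function name `pearson_r` is an assumption, and the population form (dividing by n) is used throughout, which gives the same r as the sample form because the factor cancels.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# thickness and size for the first ten rows shown in the report's sample.
thickness = [5, 5, 3, 6, 4, 8, 1, 2, 2, 4]
size = [1, 4, 1, 8, 1, 10, 1, 1, 1, 2]
print(pearson_r(thickness, size))
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the invariance property mentioned in the description.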

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
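
As a rough illustration of the rank-based calculation, the sketch below converts each variable to ranks and then applies the covariance-over-standard-deviations formula to those ranks. Tied values receive the average of the ranks they span, which is a common convention but an assumption here; the function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)

# A monotonic but nonlinear relationship (y = x cubed).
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

The cubic example is perfectly monotonic, so the rank correlation is 1 even though the relationship is not linear.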

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
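
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; it treats tied pairs as neither concordant nor discordant. Statistical libraries commonly report the tie-adjusted tau-b variant instead, so values for heavily tied data such as this dataset may differ from what the report shows. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither total
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the three-point example, two pairs are concordant and one is discordant, so the result is (2 - 1) / 3.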

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
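
For readers who want to see the shape of the bias correction, here is a minimal sketch of the bias-corrected Cramér's V computed from a contingency table. It follows the commonly cited form of Bergsma's correction (shrinking φ² and the table dimensions by terms in 1/(n - 1)); the function name and the list-of-lists table layout are assumptions, and this is not taken from the report generator's source.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated table
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independent table
```

The two example tables mark the ends of the scale: a perfectly associated table gives a value of 1 (up to floating-point rounding), and an independent table gives 0.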

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
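
The report counts zero missing cells, yet the bare variable contains 16 "?" entries and is typed as Categorical rather than numeric. One plausible reading, and it is only an assumption, is that "?" is a placeholder for missing values that was read in as text. The sketch below counts such placeholders per column with the standard library; the first two data rows come from the report's sample, while the third row is invented for illustration.

```python
import csv
import io

# Header and two rows from the report's sample; the last row is hypothetical.
raw = """ID,thickness,size,shape,Marg,Epith,bare,b1,nucleoli,Mitoses,class
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
9999999,8,4,5,1,2,?,7,3,1,4
"""

def count_placeholders(text, placeholder="?"):
    """Count cells equal to the placeholder in each column of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value.strip() == placeholder:
                counts[name] += 1
    return counts

counts = count_placeholders(raw)
print({name: c for name, c in counts.items() if c > 0})
```

If the assumption holds for the real file, treating "?" as missing at load time would let bare be profiled as a numeric variable with 16 missing cells.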

Sample

First rows

     ID       thickness  size  shape  Marg  Epith  bare  b1  nucleoli  Mitoses  class
0    1000025  5          1     1      1     2      1     3   1         1        2
1    1002945  5          4     4      5     7      10    3   2         1        2
2    1015425  3          1     1      1     2      2     3   1         1        2
3    1016277  6          8     8      1     3      4     3   7         1        2
4    1017023  4          1     1      3     2      1     3   1         1        2
5    1017122  8          10    10     8     7      10    9   7         1        4
6    1018099  1          1     1      1     2      10    3   1         1        2
7    1018561  2          1     2      1     2      1     3   1         1        2
8    1033078  2          1     1      1     2      1     1   1         5        2
9    1033078  4          2     1      1     2      1     2   1         1        2

Last rows

     ID       thickness  size  shape  Marg  Epith  bare  b1  nucleoli  Mitoses  class
689  654546   1          1     1      1     2      1     1   1         8        2
690  654546   1          1     1      3     2      1     1   1         1        2
691  695091   5          10    10     5     4      5     4   4         1        4
692  714039   3          1     1      1     2      1     1   1         1        2
693  763235   3          1     1      1     2      1     2   1         2        2
694  776715   3          1     1      1     3      2     1   1         1        2
695  841769   2          1     1      1     2      1     1   1         1        2
696  888820   5          10    10     3     7      3     8   10        2        4
697  897471   4          8     6      4     3      4     10  6         1        4
698  897471   4          8     8      5     4      5     10  4         1        4

Duplicate rows

Most frequent

     ID       thickness  size  shape  Marg  Epith  bare  b1  nucleoli  Mitoses  class  count
0    320675   3          3     5      2     3      10    7   1         1        4      2
1    466906   1          1     1      1     2      1     1   1         1        2      2
2    704097   1          1     1      1     1      1     2   1         1        2      2
3    1100524  6          10    10     2     8      10    7   3         3        4      2
4    1116116  9          10    10     1     10     8     3   3         1        4      2
5    1198641  3          1     1      1     2      1     3   1         1        2      2
6    1218860  1          1     1      1     1      1     3   1         1        2      2
7    1321942  5          1     1      1     2      1     3   1         1        2      2
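
A table like the one above can be produced by counting identical rows. The following is a minimal standard-library sketch, assuming each record is a sequence of field values in a fixed column order; the function name and the small sample are illustrative. The duplicated record is taken from the duplicate rows listed in the report, and the other two come from the first rows of the sample.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return each row that occurs more than once, mapped to its count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: count for row, count in counts.items() if count > 1}

# Fields: ID, thickness, size, shape, Marg, Epith, bare, b1, nucleoli, Mitoses, class
rows = [
    (1000025, 5, 1, 1, 1, 2, "1", 3, 1, 1, "2"),
    (1198641, 3, 1, 1, 1, 2, "1", 3, 1, 1, "2"),
    (1002945, 5, 4, 4, 5, 7, "10", 3, 2, 1, "2"),
    (1198641, 3, 1, 1, 1, 2, "1", 3, 1, 1, "2"),
]

for row, count in duplicate_rows(rows).items():
    print(row, count)
```

Rows must match on every field to count as duplicates, which is why two records sharing only an ID (such as the two 1033078 rows in the sample) would not be reported.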